[TASK #7]: Advanced

Sentiment Analysis of Indian News and Stock Forecasting

Load SENSEX (S&P BSE SENSEX) data

Load INDIAN News Headlines Data

**************************** SENTIMENT ANALYSIS ON INDIAN NEWS HEADLINES ****************************

Predicting the Sentiments by Un-Supervised method

Algorithm for predicting Sentiments by Un-Supervised method

1. Preprocess the headlines: remove punctuation and stop words, and lemmatise the words. Save the preprocessed headlines in the clean_headlines.csv file.
2. Train the CBOW (Continuous Bag Of Words) model to create word embeddings of size 300. After training, each word is represented by a 300-dimensional vector, so the vocabulary of the whole headline corpus lives in a 300-dimensional space. Save the model as 'word2vec.model'.
3. Using K-Means clustering, create two clusters. This divides all the 300-dimensional word vectors into 2 clusters based on spatial similarity.
4. Determine the sentiment coefficient for each word in the headline corpus. The sentiment coefficient is determined by (a) the closeness score, which measures how close a data point is to its own cluster centroid, and (b) the cluster value, which is 1 or -1 depending on whether the cluster is the positive or the negative cluster.
5. Create a dictionary of sentiment coefficients for each word.
6. Determine the TF-IDF value of each word and create a TF-IDF dictionary for each word.
7. Replace the words in each sentence with TF-IDF values from the above dictionary to get the TF-IDF vector of the sentence.
8. Replace the words in each sentence with the sentiment coefficients from the Sentiment Coefficient dictionary to create another vector.
9. Take the dot product of both vectors to determine the sentiment rate. If the sentiment rate is positive, the overall sentiment is positive; otherwise, it is negative.

Import for NLP

Pre-processing news headlines
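
As a rough illustration of the pre-processing step, here is a minimal, dependency-free sketch. The stop-word list and the function name `clean_headline` are assumptions made for this example, and lemmatisation is left out to keep it self-contained; it is not the notebook's own code.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption); a real run would use a full list.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "and", "is", "as"}


def clean_headline(headline: str) -> List[str]:
    """Lowercase a headline, strip punctuation and drop stop words.

    Lemmatisation is intentionally omitted in this sketch.
    """
    text = headline.lower()
    # Replace anything that is not a letter, digit or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


cleaned = clean_headline("Sensex surges 500 points, as banks rally!")
print(cleaned)
```

The resulting token lists are what would be written out to clean_headlines.csv and later fed to the Word2Vec training step.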

Save the cleaned headlines

Create and Train Word2Vec Model

Create the model

Train the Word2Vec Model

Save the word2vec model

Use K-Means model to create positive and negative clusters from word vectors

Divide the whole vector space into positive and negative clusters

Determine the Sentiment score of each word based on closeness with its cluster
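
One way to read the sentiment coefficient is as the cluster value multiplied by a closeness score. The sketch below assumes that closeness is the inverse of the Euclidean distance to the word's own centroid, and that the analyst has already decided which cluster is positive; the algorithm notes do not fix either choice. The 2-D vectors stand in for the 300-dimensional embeddings, and `sentiment_coefficient` is a hypothetical helper.

```python
import math


def sentiment_coefficient(vector, centroids, cluster_signs):
    """Return cluster value (+1 / -1) times closeness to the word's own centroid."""
    # Distance from the word vector to every centroid.
    distances = [math.dist(vector, centroid) for centroid in centroids]
    # The word belongs to the cluster whose centroid is nearest.
    own = min(range(len(centroids)), key=distances.__getitem__)
    # Assumption: closeness is the inverse distance (epsilon avoids division by zero).
    closeness = 1.0 / (distances[own] + 1e-9)
    return cluster_signs[own] * closeness


# Toy 2-D centroids; index 0 is assumed to be the positive cluster.
centroids = [(1.0, 1.0), (-1.0, -1.0)]
cluster_signs = [1, -1]

coef_good = sentiment_coefficient((1.0, 0.5), centroids, cluster_signs)
coef_bad = sentiment_coefficient((-1.0, -0.5), centroids, cluster_signs)
print(coef_good, coef_bad)
```

Words that sit close to their centroid get a large magnitude, so they dominate the sentence-level score more than words near the cluster boundary.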

Determine the TF-IDF score of each word

Create a dictionary of TF-IDF of each word and replace each word in sentence with TF-IDF score
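
The TF-IDF dictionary and the word-to-score replacement can be sketched in plain Python as follows. This uses a textbook TF-IDF variant (term frequency times `log(N / df) + 1`); a library vectoriser may smooth or normalise differently, so the numbers are illustrative only. Both function names are assumptions for this example.

```python
import math
from collections import Counter


def idf_dictionary(docs):
    """Map each word to an IDF weight computed over tokenised documents."""
    n_docs = len(docs)
    # Count in how many documents each word appears (not how often).
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Replace every word in a document with its TF-IDF score."""
    counts = Counter(doc)
    return [counts[word] / len(doc) * idf[word] for word in doc]


docs = [["sensex", "rally"], ["sensex", "crash"]]
idf = idf_dictionary(docs)
vec = tfidf_vector(docs[0], idf)
print(vec)
```

Here "sensex" occurs in every headline and so receives a lower weight than "rally", which occurs in only one.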

Replace each word in sentence with sentiment coefficients

Predict Sentiment
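
The prediction itself reduces to a dot product between the two per-sentence vectors. A minimal sketch, with made-up vector values and hypothetical function names:

```python
def sentiment_rate(tfidf_vec, coef_vec):
    """Dot product of the TF-IDF vector and the sentiment-coefficient vector."""
    return sum(t * c for t, c in zip(tfidf_vec, coef_vec))


def predict_sentiment(tfidf_vec, coef_vec):
    """Positive if the sentiment rate is above zero, otherwise negative."""
    return "positive" if sentiment_rate(tfidf_vec, coef_vec) > 0 else "negative"


# Illustrative values for a two-word headline.
tfidf_vec = [0.5, 0.8]
coef_vec = [2.0, -0.5]

rate = sentiment_rate(tfidf_vec, coef_vec)
label = predict_sentiment(tfidf_vec, coef_vec)
print(rate, label)
```

A rate of exactly zero falls into the negative class here, following the "otherwise" branch of the algorithm description.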

*********************** TIME SERIES ANALYSIS OF SENSEX (S&P BSE SENSEX) STOCKS ***********************

Algorithms for Time Series Forecasting

1. Next-day forecasting is done on the 'Close' value of SENSEX (S&P BSE SENSEX) stocks.
2. Split the data at 31st Dec 2019: the training data is before 31st Dec 2019, and prediction is done for the year 2020, starting from 1st January 2020.
3. Normalize the Close values to the range 0 to 1 for better training.
4. Prepare and train the RNN / LSTM model.
5. Prepare the test data from the Close values after 31st Dec 2019.
6. Run predictions on the test data prepared above.

ALGORITHM 1 : Stock Prediction using RNN

Get Time Series Data for 'Close'

Split the data
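
A date-based split can be sketched as below. The rows are made-up (date, close) pairs, not real SENSEX values, and `split_by_date` is a hypothetical helper. The notes do not say which side 31st Dec 2019 itself belongs to, so this sketch leaves that single day out of both sets.

```python
from datetime import date

SPLIT_DATE = date(2019, 12, 31)
TEST_START = date(2020, 1, 1)


def split_by_date(rows):
    """Split (day, close) rows into train (before the split date) and test (2020 onwards)."""
    train = [close for day, close in rows if day < SPLIT_DATE]
    test = [close for day, close in rows if day >= TEST_START]
    return train, test


# Made-up closing values, for illustration only.
rows = [
    (date(2019, 12, 27), 100.0),
    (date(2019, 12, 30), 101.0),
    (date(2020, 1, 1), 102.0),
    (date(2020, 1, 2), 103.0),
]

train_close, test_close = split_by_date(rows)
print(train_close, test_close)
```

Splitting by date rather than at random keeps future values out of the training set, which matters for any forecasting task.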

Normalize the train time-series data
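
Min-max normalisation to the range 0 to 1 can be written in a few lines. The sketch below fits the minimum and maximum on the training values only and reuses them later, which is one common way to avoid leaking test information; the function names are assumptions.

```python
def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def inverse_scale(values, lo, hi):
    """Map scaled values back to the original units."""
    return [v * (hi - lo) + lo for v in values]


train_values = [100.0, 150.0, 200.0]
lo, hi = fit_min_max(train_values)
scaled = scale(train_values, lo, hi)
restored = inverse_scale(scaled, lo, hi)
print(scaled)
```

Test values scaled with the training minimum and maximum can fall outside 0 to 1 if 2020 prices move beyond the training range; `inverse_scale` is what turns model outputs back into index points.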

Reshaping data as expected by RNN model in TensorFlow 2.0
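
Recurrent layers generally expect input shaped as (samples, timesteps, features). A sliding-window sketch that produces that layout from a 1-D series is shown below; the window length of 3 is an arbitrary assumption (the notes do not state the one actually used), and `make_windows` is a hypothetical helper.

```python
def make_windows(series, window):
    """Build (samples, timesteps, 1) inputs and next-step targets from a 1-D series."""
    inputs, targets = [], []
    for i in range(window, len(series)):
        # Each sample holds the previous `window` values, one feature per timestep.
        inputs.append([[value] for value in series[i - window:i]])
        # The target is the value immediately after the window.
        targets.append(series[i])
    return inputs, targets


X, y = make_windows([0.0, 0.1, 0.2, 0.3, 0.4], window=3)
print(len(X), len(X[0]), len(X[0][0]))
```

The nested lists can be converted to an array before being passed to a model; the same windowing is then applied to the test values so that each 2020 prediction is based on the preceding days.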

Prepare the RNN / LSTM model

Train the RNN / LSTM Model

Save the trained model

Prepare test data

Predict the Stock Price for 2020 (after 31st Dec 2019)

Combined Visualisations from Stock Market and Sentiment Analysis Predictions